Take-home Exercise 1: Application of Spatial Point Patterns Analysis to discover the geographical distribution of Grab hailing services in Singapore

Published: January 23, 2024

Modified: February 5, 2024

1.0 Introduction

1.1 Overview - Setting the Scene

Human mobility, the movement of human beings in space and time, reflects the spatial-temporal characteristics of human behavior. With the advancement of Information and Communication Technologies (ICT), especially smartphones, large volumes of data related to human mobility have been collected. By using appropriate GIS analysis methods, these data are potentially useful in supporting smart city planning and management.

In Singapore, one of the important sources of data related to human mobility is the Land Transport Authority (LTA) DataMall. Two data sets related to human mobility are provided by the portal: Passenger Volume by Origin Destination Train Stations and Passenger Volume by Origin Destination Bus Stops. One limitation of these data sets is that their locations are biased towards either bus stops or MRT/LRT stations. In 2020, another very interesting human mobility data set called Grab Posisi was released by Grab, one of the largest ride-hailing operators in South-east Asia. Two data sets were released, one of which covers Singapore.

1.2 Objectives

Geospatial analytics hold tremendous potential to address complex problems facing society.

In this study, we will be working on the following tasks:

  • Apply appropriate spatial point patterns analysis methods

  • Discover the geographical and spatial-temporal distribution of Grab hailing services locations in Singapore

1.3 Getting Started

In this take-home exercise, we will be using the following packages:

  • arrow: for reading in parquet files
  • gifski: for rendering GIF animations
  • lubridate: for working with datetimes
  • maptools: for manipulating geospatial data. Note: maptools has been removed from the CRAN repository, so we need to install it from the CRAN archive.
  • tidyverse: for performing data science tasks such as importing, wrangling and visualising data
  • tmap: for creating thematic maps
  • raster: for writing, manipulating, analyzing and modeling gridded spatial data
  • reshape2: for reshaping data between wide and long formats
  • sf: for handling geospatial data
  • sp: for managing SpatialPointsDataFrame, SpatialLinesDataFrame and performing projection transformation
  • spatstat: for performing point pattern analysis and deriving the kernel density estimation (KDE) layer
  • spNetwork: for performing spatial analysis on networks

Let’s load in the packages using p_load() of pacman package.

pacman::p_load(arrow, gifski, lubridate, maptools, tidyverse, tmap,
               raster, reshape2, sf, sp, spatstat, spNetwork)

2.0 Data Acquisition

We will be using 3 data sets in this exercise:

  • Grab Posisi (Parquet): Grab taxi location points, either origins or destinations. Source: Grab
  • Road (Shapefile): road layer within Singapore excluding outer islands. Source: Geofabrik download server
  • Master Plan 2019 Subzone Boundary (No Sea) (Shapefile): Singapore boundary layer excluding outer islands. Source: Data.gov.sg

2.1 Extracting Geospatial and Aspatial Data Sets

Start by creating a new folder labeled Take-home_Ex01. Within this folder, create a sub-folder named data. Inside the data sub-folder, create two additional sub-folders and rename them geospatial and aspatial respectively.

Unzip the malaysia-singapore-brunei-latest-free.shp.zip and MPSZ-2019.zip folders and place all files into geospatial sub-folder.

Place all files from GrabPosisi into aspatial sub-folder.

Tip

Did you observe that the file names from GrabPosisi are quite lengthy? Shortening them could make processing more convenient later on.

Alternatively, we can use list.files() to get a list of filenames containing the .parquet extension.
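As a sketch of the renaming tip above (the part_ prefix and two-digit numbering are illustrative choices, not part of the original data set), the files could be shortened in one go:

```r
# collect the lengthy GrabPosisi file names from the aspatial sub-folder
old_names <- list.files("data/aspatial", pattern = "\\.parquet$", full.names = TRUE)

# rename them to a short, sortable pattern: part_01.parquet, part_02.parquet, ...
new_names <- file.path("data/aspatial", sprintf("part_%02d.parquet", seq_along(old_names)))
file.rename(old_names, new_names)
```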

3.0 Geospatial Data Wrangling

3.1 Data Aggregation: Importing and Combining Aspatial parquet files

# Use list.files to get a list of filenames that match the pattern
parquet_files <- list.files(path = "data/aspatial", pattern = "\\.parquet$", full.names = TRUE)

grab_data <- data.frame()

for (file_path in parquet_files) {
  grab_data <- bind_rows(grab_data, read_parquet(file_path))
}

3.2 Data Export: Writing DataFrame to RDS file

write_rds(grab_data, "data/rds/grab_data.rds")
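Note that write_rds() will not create a missing folder, and only the geospatial and aspatial sub-folders were created earlier. A small guard (assuming the same data/rds path as above) avoids an error:

```r
# create the rds output folder if it does not exist yet; write_rds() will not do this
if (!dir.exists("data/rds")) {
  dir.create("data/rds", recursive = TRUE)
}
```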

3.3 Data Import

3.3.1 Importing Aspatial Data - Grab Posisi data in RDS format

grab_df <- read_rds("data/rds/grab_data.rds")
str(grab_df)
'data.frame':   30329685 obs. of  9 variables:
 $ trj_id       : chr  "70014" "73573" "75567" "1410" ...
 $ driving_mode : chr  "car" "car" "car" "car" ...
 $ osname       : chr  "android" "android" "android" "android" ...
 $ pingtimestamp: int  1554943236 1555582623 1555141026 1555731693 1555584497 1555395258 1554768955 1554783532 1554898418 1555593189 ...
 $ rawlat       : num  1.34 1.32 1.33 1.26 1.28 ...
 $ rawlng       : num  104 104 104 104 104 ...
 $ speed        : num  18.9 17.7 14 13 14.8 ...
 $ bearing      : int  248 44 34 181 93 73 82 321 324 31 ...
 $ accuracy     : num  3.9 4 3.9 4 3.9 ...

We can observe that the Grab data contains 30,329,685 observations and 9 variables. Notice that pingtimestamp is in the wrong format: it should be a date/time, not an integer. We will convert the data type in the next section (Data Preparation).
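As a quick preview of that conversion (using the first pingtimestamp value shown above), as_datetime() of the lubridate package interprets these integers as Unix epoch seconds:

```r
library(lubridate)

# pingtimestamp stores Unix epoch time in seconds;
# as_datetime() converts it to a POSIXct date-time in UTC
as_datetime(1554943236)
# "2019-04-11 00:40:36 UTC"
```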

3.3.2 Importing Geospatial Data - Road data in shapefile format

We can import geospatial data into RStudio using st_read() of sf package. Let’s try it now!

roads_sf <- st_read(dsn="data/geospatial",
                   layer="gis_osm_roads_free_1")
Reading layer `gis_osm_roads_free_1' from data source 
  `C:\kt526\IS415-GAA\Take-home_Ex\Take-home_Ex01\data\geospatial' 
  using driver `ESRI Shapefile'
Simple feature collection with 1759836 features and 10 fields
Geometry type: LINESTRING
Dimension:     XY
Bounding box:  xmin: 99.66041 ymin: 0.8021131 xmax: 119.2601 ymax: 7.514393
Geodetic CRS:  WGS 84
str(roads_sf)
Classes 'sf' and 'data.frame':  1759836 obs. of  11 variables:
 $ osm_id  : chr  "4386520" "4578273" "4579495" "4579533" ...
 $ code    : int  5113 5114 5122 5122 5122 5122 5141 5122 5122 5122 ...
 $ fclass  : chr  "primary" "secondary" "residential" "residential" ...
 $ name    : chr  "Orchard Road" "Jalan Bukit Bintang" "Jalan Nagasari" "Persiaran Raja Chulan" ...
 $ ref     : chr  NA NA NA NA ...
 $ oneway  : chr  "F" "F" "B" "B" ...
 $ maxspeed: int  50 0 0 0 0 0 0 0 0 0 ...
 $ layer   : num  0 0 0 0 0 0 -1 0 0 0 ...
 $ bridge  : chr  "F" "F" "F" "F" ...
 $ tunnel  : chr  "F" "F" "F" "F" ...
 $ geometry:sfc_LINESTRING of length 1759836; first list element:  'XY' num [1:2, 1:2] 103.83 103.83 1.31 1.31
 - attr(*, "sf_column")= chr "geometry"
 - attr(*, "agr")= Factor w/ 3 levels "constant","aggregate",..: NA NA NA NA NA NA NA NA NA NA
  ..- attr(*, "names")= chr [1:10] "osm_id" "code" "fclass" "name" ...

3.3.3 Importing Geospatial Data - Master Plan 2019 Subzone Boundary (No Sea) in shapefile format

mpsz_sf <- st_read(dsn = "data/geospatial", 
                layer = "MPSZ-2019")
Reading layer `MPSZ-2019' from data source 
  `C:\kt526\IS415-GAA\Take-home_Ex\Take-home_Ex01\data\geospatial' 
  using driver `ESRI Shapefile'
Simple feature collection with 332 features and 6 fields
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: 103.6057 ymin: 1.158699 xmax: 104.0885 ymax: 1.470775
Geodetic CRS:  WGS 84
str(mpsz_sf)
Classes 'sf' and 'data.frame':  332 obs. of  7 variables:
 $ SUBZONE_N : chr  "MARINA EAST" "INSTITUTION HILL" "ROBERTSON QUAY" "JURONG ISLAND AND BUKOM" ...
 $ SUBZONE_C : chr  "MESZ01" "RVSZ05" "SRSZ01" "WISZ01" ...
 $ PLN_AREA_N: chr  "MARINA EAST" "RIVER VALLEY" "SINGAPORE RIVER" "WESTERN ISLANDS" ...
 $ PLN_AREA_C: chr  "ME" "RV" "SR" "WI" ...
 $ REGION_N  : chr  "CENTRAL REGION" "CENTRAL REGION" "CENTRAL REGION" "WEST REGION" ...
 $ REGION_C  : chr  "CR" "CR" "CR" "WR" ...
 $ geometry  :sfc_MULTIPOLYGON of length 332; first list element: List of 1
  ..$ :List of 1
  .. ..$ : num [1:300, 1:2] 104 104 104 104 104 ...
  ..- attr(*, "class")= chr [1:3] "XY" "MULTIPOLYGON" "sfg"
 - attr(*, "sf_column")= chr "geometry"
 - attr(*, "agr")= Factor w/ 3 levels "constant","aggregate",..: NA NA NA NA NA NA
  ..- attr(*, "names")= chr [1:6] "SUBZONE_N" "SUBZONE_C" "PLN_AREA_N" "PLN_AREA_C" ...

We can observe that both roads_sf and mpsz_sf are currently using the WGS 84 geographic coordinate system.
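We can confirm this programmatically with st_crs() of the sf package before transforming:

```r
# both layers should report WGS 84 (EPSG:4326) prior to any transformation
st_crs(roads_sf)
st_crs(mpsz_sf)
```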

Summary Points

After looking at the file structure and contents of the 3 datasets, we need to perform the following preparations:

  • grab_df

    • Converting data type

    • Extracting trip starting location

    • Converting aspatial data into geospatial data

    • Getting grab data points within Downtown Core

  • roads_sf and mpsz_sf

    • Performing projection transformations

    • Extracting study area

    • Creating owin object

    • Getting road layer within Singapore excluding outer islands

3.4 Data Preparation

grab_df

3.4.1 Preparing and Converting grab_df into grab_origins_sf (sf tibble data.frame)

# Converting data type using as_datetime()
grab_df$pingtimestamp <- as_datetime(grab_df$pingtimestamp)

# Checking first n rows of data frame using head()
head(grab_df)

grab_origins_sf <- grab_df %>% 
  group_by(trj_id) %>% 
  arrange(pingtimestamp) %>% 
  filter(row_number()==1) %>%
  mutate(weekday = wday(pingtimestamp,
                        label=TRUE,
                        abbr=TRUE),
         start_hr = hour(pingtimestamp),
         day = mday(pingtimestamp),
         date_col = date(pingtimestamp)) %>%
  st_as_sf(coords = c("rawlng", "rawlat"),
           crs = 4326) %>%
  st_transform(crs = 3414)

# Getting geometry details using st_geometry()
st_geometry(grab_origins_sf)
Code annotations:

1. Converting the data type of pingtimestamp from integer to datetime using as_datetime() from the lubridate package
2. By default, head() returns the first 6 rows
3. Converting grab_df into an sf tibble data.frame using st_as_sf() and its location information
4. Transforming the coordinates using st_transform()
  trj_id driving_mode  osname       pingtimestamp   rawlat   rawlng    speed
1  70014          car android 2019-04-11 00:40:36 1.342326 103.8890 18.91000
2  73573          car android 2019-04-18 10:17:03 1.321781 103.8564 17.71908
3  75567          car android 2019-04-13 07:37:06 1.327088 103.8613 14.02155
4   1410          car android 2019-04-20 03:41:33 1.262482 103.8238 13.02652
5   4354          car android 2019-04-18 10:48:17 1.283799 103.8072 14.81294
6  32630          car android 2019-04-16 06:14:18 1.300330 103.9062 23.23818
  bearing accuracy
1     248      3.9
2      44      4.0
3      34      3.9
4     181      4.0
5      93      3.9
6      73      3.9
Geometry set for 28000 features 
Geometry type: POINT
Dimension:     XY
Bounding box:  xmin: 3628.243 ymin: 25198.14 xmax: 49845.23 ymax: 49689.64
Projected CRS: SVY21 / Singapore TM
First 5 geometries:

3.4.2 Creating ppp objects

grab_origins_ppp <- grab_origins_sf %>%
  as('Spatial') %>%
  as('SpatialPoints') %>%
  as('ppp')
summary(grab_origins_ppp)
Planar point pattern:  28000 points
Average intensity 2.473666e-05 points per square unit

Coordinates are given to 3 decimal places
i.e. rounded to the nearest multiple of 0.001 units

Window: rectangle = [3628.24, 49845.23] x [25198.14, 49689.64] units
                    (46220 x 24490 units)
Window area = 1131920000 square units

The presence of duplicates can affect the spatial point pattern analysis. Let us check if our grab_origins_ppp contains any duplicated points.

# Checking for duplicated points
any(duplicated(grab_origins_ppp))
[1] FALSE
plot(grab_origins_ppp)
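Since no duplicates were found, no treatment is needed. Had duplicates been present, one common remedy (a sketch only, not run on our data) is rjitter() of spatstat, which displaces tied points by a small random amount:

```r
# hypothetical remedy if duplicates existed: jitter tied points slightly so that
# every location becomes unique; retry = TRUE re-draws any point that would
# otherwise fall outside the window
grab_origins_ppp_jit <- rjitter(grab_origins_ppp, retry = TRUE, nsim = 1, drop = TRUE)

any(duplicated(grab_origins_ppp_jit))  # should now be FALSE
```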

3.4.3 Performing projection transformation

To perform projection transformation, we will use st_transform(). Additionally, we will also use st_geometry() to inspect the contents of roads_sf and mpsz_sf after the projection transformation.

roads_sf

# Transforming coordinates using st_transform()
roads_sf <- st_transform(roads_sf,
                           crs = 3414)

# Getting geometry details using st_geometry()
st_geometry(roads_sf)
Geometry set for 1759836 features 
Geometry type: LINESTRING
Dimension:     XY
Bounding box:  xmin: -434085.6 ymin: -23036.54 xmax: 1759022 ymax: 741993.8
Projected CRS: SVY21 / Singapore TM
First 5 geometries:

mpsz_sf

# Transforming coordinates using st_transform()
mpsz_sf <- st_transform(mpsz_sf,
                          crs = 3414)

# Getting geometry details using st_geometry()
st_geometry(mpsz_sf)
Geometry set for 332 features 
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: 2667.538 ymin: 15748.72 xmax: 56396.44 ymax: 50256.33
Projected CRS: SVY21 / Singapore TM
First 5 geometries:

Now, we observe that both roads_sf and mpsz_sf are using the SVY21 projected coordinate system.

3.4.4 Extracting specific planning area - Downtown Core

Given the high computational demands associated with analyzing the entire region of Singapore, our focus will be specifically on the Downtown Core planning area. This targeted approach aims to explore spatial data points within the Central Business District areas.

# extract Downtown core planning area
sg_downtown_sf <- mpsz_sf %>%
  filter(PLN_AREA_N == "DOWNTOWN CORE")

# plot the mpsz with downtown core
tmap_mode("plot")
tm_shape(mpsz_sf) +
  tm_polygons() +
  tmap_options(check.and.fix = TRUE) +
tm_shape(sg_downtown_sf) +
  tm_fill(col = "pink") +
  tm_layout(main.title = "Map of Singapore by planning subzone",
            main.title.position = "center",
            main.title.size = 1.2) + 
  tm_view(set.zoom.limits = c(11, 16))

Here is how the zoomed-in version of the Downtown Core planning area looks:

plot(sg_downtown_sf["SUBZONE_N"], main = "DOWNTOWN CORE", col = "pink")

3.4.5 Creating owin object - Downtown Core

We create an owin object to confine the analysis to the Downtown Core planning area. After creating this object, we use summary() to get a quick overview of its key features.

downtown_owin <- as.owin(sg_downtown_sf)
summary(downtown_owin)
Window: polygonal boundary
single connected closed polygon with 637 vertices
enclosing rectangle: [28896.26, 31980.09] x [27914.19, 31855.48] units
                     (3084 x 3941 units)
Window area = 5083640 square units
Fraction of frame area: 0.418
plot(downtown_owin)

3.4.6 Combining grab data points with study area - Downtown Core

grab_origins_dt_ppp <- grab_origins_ppp[downtown_owin]
plot(grab_origins_dt_ppp, main="DOWNTOWN CORE")

3.4.7 Getting grab data points within Downtown Core

To obtain grab data points within the Downtown Core, we can apply st_intersection() from the sf package to ensure exclusion of other planning areas and outer islands in the analysis.

sg_downtown_grab_sf <- st_intersection(grab_origins_sf, sg_downtown_sf)

3.4.8 Getting road layer within Singapore excluding outer islands

Similarly, to get the road layer within Downtown Core (excluding other planning areas and outer islands), st_intersection() from the sf package will be used.

sg_downtown_road_sf <- st_intersection(roads_sf, sg_downtown_sf)
sg_downtown_road_sf <- st_cast(sg_downtown_road_sf, "LINESTRING")

4.0 1st Order Spatial Point Pattern Analysis (SPPA)

Now, we will be performing 1st order spatial point pattern analysis (SPPA) using spatstat package.

4.1 Kernel Density Estimation (KDE)

We will employ Kernel Density Estimation (KDE) to visualize the spatial distribution patterns of the initial trajectory locations of Grab hailing services (Grab hailing service pickups). This analysis aims to identify regions exhibiting a high concentration of these starting trajectory locations. KDE is a statistical method that allows us to create a smooth representation of the data, helping us uncover areas with dense clusters of Grab hailing service pickups.

We will calculate the KDE using the bw.diggle bandwidth and the default smoothing kernel, gaussian. As the default unit of measurement for SVY21 is meters, the resulting density values are expressed in points per square meter. To facilitate a more comprehensible interpretation, we will use rescale() to convert the measurement unit from meters to kilometers, so that the density values represent the number of points per square kilometer, providing a more intuitive understanding of the spatial patterns in the data.

par(mfrow=c(2,2))

# KDE with default values
kde_grab_origins_dt_bw <- density(grab_origins_dt_ppp,
                              sigma=bw.diggle,
                              edge=TRUE,
                              kernel="gaussian")

plot(kde_grab_origins_dt_bw, main = "KDE Grab Origins for Downtown Core")

# Rescalling KDE values
grab_origins_dt_ppp.km <- rescale(grab_origins_dt_ppp, 1000, "km")

# KDE with rescaled values
kde_grab_origins_dt_bw <- density(grab_origins_dt_ppp.km,
                              sigma=bw.diggle,
                              edge=TRUE,
                              kernel="gaussian")

plot(kde_grab_origins_dt_bw, main = "KDE Grab Origins for Downtown Core")

Upon comparing the two plots above, the differences lie only in the legends: the plots themselves look identical, while the density values scale by a factor of one million (1000^2) when the unit changes from square meters to square kilometers.

4.1.1 Working with different automatic bandwidth methods

par(mfrow=c(1,4))
# Choosing automatic bandwidth
plot(density(grab_origins_dt_ppp.km,
             sigma=bw.CvL,
             edge=TRUE,
             kernel="gaussian"),
     main="Using bw.CvL")

plot(density(grab_origins_dt_ppp.km,
             sigma=bw.scott,
             edge=TRUE,
             kernel="gaussian"),
     main="Using bw.scott")

plot(density(grab_origins_dt_ppp.km,
             sigma=bw.diggle,
             edge=TRUE,
             kernel="gaussian"),
     main="Using bw.diggle")

plot(density(grab_origins_dt_ppp.km,
             sigma=bw.ppl,
             edge=TRUE,
             kernel="gaussian"),
     main="Using bw.ppl")

4.1.2 Working with different kernels

par(mfrow=c(1,4))
# Choosing kernels
plot(density(grab_origins_dt_ppp.km,
             sigma=bw.ppl,
             edge=TRUE,
             kernel="gaussian"),
             main="Using gaussian")

plot(density(grab_origins_dt_ppp.km,
             sigma=bw.ppl,
             edge=TRUE,
             kernel="epanechnikov"),
             main="Using epanechnikov")

plot(density(grab_origins_dt_ppp.km,
             sigma=bw.ppl,
             edge=TRUE,
             kernel="quartic"),
             main="Using quartic")

plot(density(grab_origins_dt_ppp.km,
             sigma=bw.ppl,
             edge=TRUE,
             kernel="disc"),
             main="Using Disc")

Since we did not observe any significant differences across the kernels, we will use the default Gaussian kernel. Thus, we will be using Gaussian kernel with bw.ppl() as our final KDE plot.

kde_grab_origins_dt_bw_ppl <- density(grab_origins_dt_ppp.km,
                                      sigma=bw.ppl,
                                      edge=TRUE,
                                      kernel="gaussian")

plot(kde_grab_origins_dt_bw_ppl)

4.1.3 Display KDE Maps on OpenStreetMap

4.1.3.1 Convert to raster for tmap display

gridded_kde_grab_origins_dt_bw_ppl <- as.SpatialGridDataFrame.im(kde_grab_origins_dt_bw_ppl)
raster_kde_grab_origins_dt_bw_ppl <- raster(gridded_kde_grab_origins_dt_bw_ppl)
spplot(gridded_kde_grab_origins_dt_bw_ppl)

Let us take a look at the properties of raster_kde_grab_origins_dt_bw_ppl RasterLayer.

raster_kde_grab_origins_dt_bw_ppl
class      : RasterLayer 
dimensions : 128, 128, 16384  (nrow, ncol, ncell)
resolution : 0.02409239, 0.03079135  (x, y)
extent     : 28.89626, 31.98009, 27.91419, 31.85548  (xmin, xmax, ymin, ymax)
crs        : NA 
source     : memory
names      : v 
values     : -2.057805e-13, 2788.346  (min, max)

Notice that the crs property is NA.

4.1.3.2 Assigning projection systems

projection(raster_kde_grab_origins_dt_bw_ppl) <- CRS("+init=EPSG:3414 +units=km")
raster_kde_grab_origins_dt_bw_ppl
class      : RasterLayer 
dimensions : 128, 128, 16384  (nrow, ncol, ncell)
resolution : 0.02409239, 0.03079135  (x, y)
extent     : 28.89626, 31.98009, 27.91419, 31.85548  (xmin, xmax, ymin, ymax)
crs        : +proj=tmerc +lat_0=1.36666666666667 +lon_0=103.833333333333 +k=1 +x_0=28001.642 +y_0=38744.572 +ellps=WGS84 +units=km +no_defs 
source     : memory
names      : v 
values     : -2.057805e-13, 2788.346  (min, max)

Notice that the crs property is now populated.

4.1.3.3 Display on tmap OpenStreetMap

tmap_mode('view')

tm_basemap('OpenStreetMap') +
  tm_shape(raster_kde_grab_origins_dt_bw_ppl) + 
  tm_raster("v") +
  tm_layout(legend.position = c("right", "bottom"), frame = FALSE) + 
  tm_shape(sg_downtown_sf) + 
  tm_borders() +
  tm_text("SUBZONE_N", size = 0.65)
tmap_mode('plot')

Interpretations of Spatial Analysis from the KDE map🔎:

  • The subzones with relatively higher concentration are Cecil, Raffles Place and Bugis, denoted by the darker green color

  • Subzones like Philip and Maxwell have relatively lower concentration, denoted by the lighter yellow color

  • From the KDE map, we can only infer whether a particular subzone is more or less concentrated, but not which specific road segments are

Despite experimenting with various Kernel Density Estimation (KDE) methods and parameters, the interpretability of the generated plots remains constrained by a fundamental limitation of KDE: it cannot account for the intricate structure of road networks. The complexity of urban road layouts is not effectively captured by traditional KDE approaches, leading to a less precise representation of spatial patterns. As a consequence, the visualizations may not offer a nuanced understanding of the spatial dynamics, particularly in urban environments with intricate road networks.

4.2 Network Kernel Density Estimation (NKDE)

As a response to the limitation of KDE, we turn to NKDE. Network KDE considers the constraints and pathways defined by the road network, addressing the shortcomings of KDE by offering a more nuanced and accurate depiction of the concentrated areas for Grab hailing service pickups within urban landscapes.

4.2.1 Visualizing the Geospatial data

Let us visualize the geospatial data first.

tmap_mode('view')
tm_shape(sg_downtown_sf) +
  tm_fill(col = "SUBZONE_N", alpha = 0.5) + 
  tm_shape(sg_downtown_road_sf) +
  tm_lines() + 
  tm_shape(sg_downtown_grab_sf) + 
  tm_dots()
tmap_mode('plot')

We can see that both the Grab data points and the road networks are confined to the Downtown Core planning area. However, it is hard to tell which specific road segments are highly concentrated.

4.2.2 Preparing the lixel objects and Generating line center points

Let us prepare the lixel objects and generate the line center points first before we perform NetKDE. We will set the length of a lixel (lx_length) to 700 meters, and the minimum length of a lixel (mindist) to 350 meters.

# splitting the lines as lixels
lixels <- lixelize_lines(sg_downtown_road_sf, 
                         700, 
                         mindist = 350)

# extracting the center of lixels as sampling points
sample_pts <- lines_center(lixels)

4.2.3 Performing NetKDE

To identify hot spots of grab origins, we will be performing a simple NKDE with a quartic kernel and bandwidth of 300 meters.

# densities for the simple NKDE
nkde_simple <- nkde(sg_downtown_road_sf, 
                  events = sg_downtown_grab_sf,
                  w = rep(1,nrow(sg_downtown_grab_sf)),
                  samples = sample_pts,
                  kernel_name = "quartic",
                  bw = 300, 
                  div= "bw", 
                  method = "simple", 
                  digits = 1, 
                  tol = 1,
                  grid_shape = c(1,1), 
                  max_depth = 8,
                  agg = 5,
                  sparse = TRUE,
                  verbose = FALSE)

To obtain more readable results, we can multiply the obtained densities by 1000 to get the estimated numbers of grab data points per kilometer.

# adding the densities as new columns to the sampling
sample_pts$density <- nkde_simple
lixels$density <- nkde_simple

# rescaling to help the mapping
sample_pts$density <- sample_pts$density*1000
lixels$density <- lixels$density*1000

Finally, let us visualize the network kernel density estimate values.

tmap_mode('view')
tm_shape(sg_downtown_sf) +
  tm_fill(col = "SUBZONE_N", alpha = 0.5) + 
  tm_shape(lixels) +
  tm_lines(col="density") +
  tm_shape(sg_downtown_grab_sf) +
  tm_dots()
tmap_mode('plot') 

Interpretations of Spatial Analysis from the NKDE map🔎:

  • Road segments that are darker in color have a relatively higher density of Grab pickups than road segments that are lighter in color
  • Similar to the KDE map, the subzones with relatively higher concentration are Cecil, Raffles Place and Bugis, denoted by the darker green color
  • Subzones like Philip and Maxwell have relatively lower concentration, denoted by the lighter yellow color

4.3 Temporal Network Kernel Density Estimation (TNKDE)

Very often, when dealing with events on a network, the events unfold over time. Temporal Network Kernel Density Estimation (TNKDE) helps us understand not just where events occur on the network, but also when, providing a more detailed picture of how events unfold.

In this section, we would like to find out:

  • How are grab hailing service pickups distributed across the days?

  • Does the concentration of Grab hailing service pickups differ across subzones and days?

4.3.1 Temporal Dimension

ggplot(sg_downtown_grab_sf) + 
  geom_histogram(aes(x = date_col), fill = "lightblue", binwidth = 1, color = "white") +
  scale_x_date(breaks = unique(sg_downtown_grab_sf$date_col), labels = unique(sg_downtown_grab_sf$date_col)) +
  labs(title = "Distribution of Grab Hailing services over 2 weeks",
       x = "Date",
       y = "Frequency") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1), plot.title = element_text(hjust = 0.5))

The bar plot illustrates a non-uniform distribution, suggesting that certain days experience heightened or reduced demand, contributing to variations in the overall biweekly pattern of Grab hailing service pickups. Notably, 2019-04-12, 2019-04-17 and 2019-04-18 emerge as the top three days with the highest pickup frequencies, indicating increased demand or activity on these specific dates. Conversely, 2019-04-14, 2019-04-20 and 2019-04-21 stand out as the bottom three days with the lowest occurrences, suggesting a potential decrease in demand or activity during these periods.
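The busiest and quietest days can also be read off directly from the data (a sketch, assuming the date_col column created in Section 3.4.1):

```r
library(dplyr)
library(sf)

# tally pickups per calendar day, busiest first
daily_counts <- sg_downtown_grab_sf %>%
  st_drop_geometry() %>%               # drop geometry before tabulating
  count(date_col, name = "pickups") %>%
  arrange(desc(pickups))

head(daily_counts, 3)  # the three busiest days
tail(daily_counts, 3)  # the three quietest days
```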

4.3.2 Preparing leave-one-out cross-validation

w <- rep(1,nrow(sg_downtown_grab_sf))
sg_downtown_grab_sf$Time <- sg_downtown_grab_sf$date_col
start <- as.POSIXct(min(sg_downtown_grab_sf$date_col), format = "%Y/%m/%d")
sg_downtown_grab_sf$Time <- difftime(sg_downtown_grab_sf$Time, start, units = "days")
sg_downtown_grab_sf$Time <- as.numeric(sg_downtown_grab_sf$Time)

cv_scores <- bws_tnkde_cv_likelihood_calc(
  bw_net_range = c(200,1100),
  bw_net_step = 100,
  bw_time_range = c(10,70),
  bw_time_step = 10,
  lines = sg_downtown_road_sf,
  events = sg_downtown_grab_sf,
  time_field = "Time",
  w = rep(1, nrow(sg_downtown_grab_sf)),
  kernel_name = "quartic",
  method = "discontinuous",
  diggle_correction = FALSE,
  study_area = NULL,
  max_depth = 10,
  digits = 2,
  tol = 0.1,
  agg = 10,
  sparse=TRUE,
  grid_shape=c(1,1),
  sub_sample=1,
  verbose = FALSE,
  check = TRUE)
knitr::kable(cv_scores)
Rows: network bandwidth bw_net (meters); columns: time bandwidth bw_time (days).

             10        20        30        40        50        60        70
 200  -70.57674 -70.74650 -71.41831 -71.92215 -72.32111 -72.65038 -72.93034
 300  -54.18005 -55.11196 -55.80120 -56.31779 -56.72682 -57.06440 -57.35143
 400  -49.37225 -50.32364 -51.01893 -51.53972 -51.95203 -52.29228 -52.58158
 500  -46.72133 -47.68459 -48.38357 -48.90684 -49.32105 -49.66285 -49.95345
 600  -44.05158 -45.02121 -45.72338 -46.24897 -46.66502 -47.00833 -47.30022
 700  -42.79765 -43.04995 -43.75432 -44.28161 -44.69900 -45.04343 -45.33628
 800  -42.25632 -43.22859 -43.93303 -44.46033 -44.87772 -45.22216 -45.51500
 900  -40.25435 -41.23169 -41.93853 -42.46759 -42.88637 -43.23194 -43.52575
1000  -40.40334 -41.38194 -42.08879 -42.61784 -43.03662 -43.38219 -43.67600
1100  -40.54406 -41.52340 -42.23028 -42.75933 -43.17811 -43.52367 -43.81748
optimal_value <- max(cv_scores)

# Get the row and column values of the optimal value
optimal_indices <- which(cv_scores == optimal_value, arr.ind = TRUE)

# Extract the row and column values
optimal_row <- rownames(cv_scores)[optimal_indices[, "row"]]
optimal_col <- colnames(cv_scores)[optimal_indices[, "col"]]

# Display the results
cat("Optimal value:", optimal_value, "\n")
Optimal value: -40.25435 
cat("Optimal meters:", optimal_row, "\n")
Optimal meters: 900 
cat("Optimal days:", optimal_col, "\n")
Optimal days: 10 

4.3.3 Performing TNKDE

According to the leave-one-out cross-validation results, the optimal pair of bandwidths is 900 meters and 10 days. We will use these bandwidths to perform a discontinuous TNKDE with a quartic kernel.

# choosing sample times (every day over the study period)
sample_time <- seq(0, max(sg_downtown_grab_sf$Time), 1)

# calculating densities
tnkde_densities <- tnkde(lines = sg_downtown_road_sf,
                   events = sg_downtown_grab_sf,
                   time_field = "Time",
                   w = rep(1, nrow(sg_downtown_grab_sf)), 
                   samples_loc = sample_pts,
                   samples_time = sample_time, 
                   kernel_name = "quartic",
                   bw_net = 900, bw_time = 10,
                   adaptive = TRUE,
                   trim_bw_net = 900,
                   trim_bw_time = 80,
                   method = "discontinuous",
                   div = "bw", max_depth = 10,
                   digits = 2, tol = 0.01,
                   agg = 15, grid_shape = c(1,1), 
                   verbose  = FALSE)

# creating a color palette for all the densities
library(classInt)
library(viridis)
all_densities <- c(tnkde_densities$k)
color_breaks <- classIntervals(all_densities, n = ncol(tnkde_densities$k), style = "kmeans")
# generating a map at each sample time
all_maps <- lapply(1:ncol(tnkde_densities$k), function(i){
  time <- sample_time[[i]]
  date <- as.Date(start) + time

  sample_pts$density <- tnkde_densities$k[,i]
  map1 <- 
    tm_shape(sample_pts) + 
    tm_dots(col = "density", size = 0.01,
          breaks = color_breaks$brks, palette = viridis(10)) + 
    tm_layout(legend.show=FALSE, main.title = as.character(date), main.title.size = 0.5)
  return(map1)
})
# creating a gif with all the maps
tmap_animation(all_maps, filename = "imgs/animated_map.gif", 
               width = 1000, height = 1000, dpi = 300, delay = 50)
Creating frames

Creating animation
Animation saved to C:\kt526\IS415-GAA\Take-home_Ex\Take-home_Ex01\imgs\animated_map.gif 

Note: If the target directory specified for saving the GIF file does not exist, we need to create it before attempting to save the animated GIF. Check whether an imgs folder exists; if not, create one.

knitr::include_graphics("imgs/animated_map.gif")

Interpretations of Spatio-Temporal Analysis from the TNKDE map🔎:

  • Road segments that are greener in color have a relatively higher density of Grab hailing service pickups
  • The TNKDE map reveals subtle variations in the concentration of Grab hailing service pickups across subzones and days over the observed two weeks
  • While the overall pattern is similar to the NKDE map, the TNKDE map suggests that Grab hailing service pickups differ slightly from day to day